Soc 722

Logistics

Schedule

We will normally meet on Tuesdays. Because of my travel schedule, we will instead meet on Thursday on the following dates:

  • today
  • January 30
  • February 6
  • February 13
  • March 20

Final exam

The exam will be held remotely on April 30, from 9 a.m. to 12 p.m.

Questions?

Introduction

Objective

This is the first in a two-course sequence designed to help you become competent quantitative researchers in sociology.

This includes learning proper decision making, explanation, computation, visualization, and interpretation.

Data and Variables

Data structure

country      continent  year  lifeExp  pop       gdpPercap
Afghanistan  Asia       1952  28.801   8425333   779.4453
Afghanistan  Asia       1957  30.332   9240934   820.8530
Afghanistan  Asia       1962  31.997   10267083  853.1007
Afghanistan  Asia       1967  34.020   11537966  836.1971
Afghanistan  Asia       1972  36.088   13079460  739.9811
Afghanistan  Asia       1977  38.438   14880372  786.1134
Afghanistan  Asia       1982  39.854   12881816  978.0114
Afghanistan  Asia       1987  40.822   13867957  852.3959
Afghanistan  Asia       1992  41.674   16317921  649.3414
Afghanistan  Asia       1997  41.763   22227415  635.3414

Tidy format: each column is a variable and each row is an observation (here, one country in one year).

Untidy data

country continent lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967 lifeExp_1972 lifeExp_1977 lifeExp_1982 lifeExp_1987 lifeExp_1992 lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962 pop_1967 pop_1972 pop_1977 pop_1982 pop_1987 pop_1992 pop_1997 pop_2002 pop_2007 gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
Afghanistan Asia 28.801 30.332 31.997 34.020 36.088 38.438 39.854 40.822 41.674 41.763 42.129 43.828 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 25268405 31889923 779.4453 820.8530 853.1007 836.1971 739.9811 786.1134 978.0114 852.3959 649.3414 635.3414 726.7341 974.5803
Albania Europe 55.230 59.280 64.820 66.220 67.690 68.930 70.420 72.000 71.581 72.950 75.651 76.423 1282697 1476505 1728137 1984060 2263554 2509048 2780097 3075321 3326498 3428038 3508512 3600523 1601.0561 1942.2842 2312.8890 2760.1969 3313.4222 3533.0039 3630.8807 3738.9327 2497.4379 3193.0546 4604.2117 5937.0295
Algeria Africa 43.077 45.685 48.303 51.407 54.518 58.014 61.368 65.799 67.744 69.152 70.994 72.301 9279525 10270856 11000948 12760499 14760787 17152804 20033753 23254956 26298373 29072015 31287142 33333216 2449.0082 3013.9760 2550.8169 3246.9918 4182.6638 4910.4168 5745.1602 5681.3585 5023.2166 4797.2951 5288.0404 6223.3675
Angola Africa 30.015 31.999 34.000 35.985 37.928 39.483 39.942 39.906 40.647 40.963 41.003 42.731 4232095 4561361 4826015 5247469 5894858 6162675 7016384 7874230 8735988 9875024 10866106 12420476 3520.6103 3827.9405 4269.2767 5522.7764 5473.2880 3008.6474 2756.9537 2430.2083 2627.8457 2277.1409 2773.2873 4797.2313
Argentina Americas 62.485 64.399 65.142 65.634 67.065 68.481 69.942 70.774 71.868 73.275 74.340 75.320 17876956 19610538 21283783 22934225 24779799 26983828 29341374 31620918 33958947 36203463 38331121 40301927 5911.3151 6856.8562 7133.1660 8052.9530 9443.0385 10079.0267 8997.8974 9139.6714 9308.4187 10967.2820 8797.6407 12779.3796
Australia Oceania 69.120 70.330 70.930 71.100 71.930 73.490 74.740 76.320 77.560 78.830 80.370 81.235 8691212 9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243 19546792 20434176 10039.5956 10949.6496 12217.2269 14526.1246 16788.6295 18334.1975 19477.0093 21888.8890 23424.7668 26997.9366 30687.7547 34435.3674
Austria Europe 66.800 67.480 69.540 70.140 70.630 72.170 73.180 74.940 76.040 77.510 78.980 79.829 6927772 6965860 7129864 7376998 7544201 7568430 7574613 7578903 7914969 8069876 8148312 8199783 6137.0765 8842.5980 10750.7211 12834.6024 16661.6256 19749.4223 21597.0836 23687.8261 27042.0187 29095.9207 32417.6077 36126.4927
Bahrain Asia 50.939 53.832 56.923 59.923 63.300 65.593 69.052 70.750 72.601 73.925 74.795 75.635 120447 138655 171863 202182 230800 297410 377967 454612 529491 598561 656397 708573 9867.0848 11635.7995 12753.2751 14804.6727 18268.6584 19340.1020 19211.1473 18524.0241 19035.5792 20292.0168 23403.5593 29796.0483
Bangladesh Asia 37.484 39.348 41.216 43.453 45.252 46.923 50.009 52.819 56.018 59.412 62.013 64.062 46886859 51365468 56839289 62821884 70759295 80428306 93074406 103764241 113704579 123315288 135656790 150448339 684.2442 661.6375 686.3416 721.1861 630.2336 659.8772 676.9819 751.9794 837.8102 972.7700 1136.3904 1391.2538
Belgium Europe 68.000 69.240 70.250 70.940 71.440 72.800 73.930 75.350 76.460 77.530 78.320 79.441 8730405 8989111 9218400 9556500 9709100 9821800 9856303 9870200 10045622 10199787 10311970 10392226 8343.1051 9714.9606 10991.2068 13149.0412 16672.1436 19117.9745 20979.8459 22525.5631 25575.5707 27561.1966 30485.8838 33692.6051
Benin Africa 38.223 40.358 42.618 44.885 47.014 49.190 50.904 52.337 53.919 54.777 54.406 56.728 1738315 1925173 2151895 2427334 2761407 3168267 3641603 4243788 4981671 6066080 7026113 8078314 1062.7522 959.6011 949.4991 1035.8314 1085.7969 1029.1613 1277.8976 1225.8560 1191.2077 1232.9753 1372.8779 1441.2849
Bolivia Americas 40.414 41.890 43.428 45.032 46.714 50.023 53.859 57.251 59.957 62.050 63.883 65.554 2883315 3211738 3593918 4040665 4565872 5079716 5642224 6156369 6893451 7693188 8445134 9119152 2677.3263 2127.6863 2180.9725 2586.8861 2980.3313 3548.0978 3156.5105 2753.6915 2961.6997 3326.1432 3413.2627 3822.1371
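
One way to get from this wide layout back to the tidy one is pivot_longer() from tidyr. The sketch below is an illustration rather than part of the original slides: it assumes the tidyr and tibble packages are installed, and the object names wide and tidy are hypothetical. It uses only two countries and two years copied from the table above.

```r
library(tidyr)
library(tibble)

# Two countries and two years copied from the wide table above
# (object names `wide` and `tidy` are illustrative)
wide <- tribble(
  ~country,      ~continent, ~lifeExp_1952, ~lifeExp_1957, ~pop_1952, ~pop_1957, ~gdpPercap_1952, ~gdpPercap_1957,
  "Afghanistan", "Asia",     28.801,        30.332,        8425333,   9240934,   779.4453,        820.8530,
  "Albania",     "Europe",   55.230,        59.280,        1282697,   1476505,   1601.0561,       1942.2842
)

# Split names like "lifeExp_1952" at the underscore:
# the first piece becomes a column name (".value"), the second becomes `year`
tidy <- wide |>
  pivot_longer(
    cols = -c(country, continent),
    names_to = c(".value", "year"),
    names_sep = "_",
    names_transform = list(year = as.integer)
  )

tidy
```

The result has one row per country-year, with lifeExp, pop, and gdpPercap as separate columns, which matches the tidy layout shown earlier.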

Types of variables

Type      Examples
Ratio     dollars; points (e.g., basketball)
Interval  degrees Celsius
Ordinal   clothing sizes; Likert scales
Nominal   race; sex; country

The first two types (ratio and interval) are numeric, often called continuous. The last two (ordinal and nominal) are categorical. Ordinal variables are often treated as numeric, and this is usually fine.
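
As a rough sketch of how these types might be stored in R (the values and object names below are made up for illustration and are not from the course data): numeric vectors cover ratio and interval variables, an ordered factor can represent an ordinal variable, and an unordered factor can represent a nominal one.

```r
# Made-up values for illustration only
price <- c(19.99, 35.00, 12.50)                   # ratio: numeric vector

size <- factor(c("S", "M", "L", "M"),
               levels = c("S", "M", "L"),
               ordered = TRUE)                    # ordinal: ordered factor

continent <- factor(c("Asia", "Europe", "Asia"))  # nominal: unordered factor

size[1] < size[3]        # order comparisons are defined for ordered factors
mean(as.integer(size))   # treating the ordinal codes (S = 1, M = 2, L = 3) as numeric
levels(continent)        # categories only; no ordering is implied
```

Converting with as.integer() is what "treating an ordinal variable as numeric" amounts to here: it assumes the gaps between adjacent categories are equal.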

Let’s investigate this using the gapminder data. First of all, we’ll keep only the most recent (2007) data.

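library(tidyverse)  # dplyr (filter, select) and ggplot2
library(gapminder)  # the gapminder data

d <- gapminder |>
  filter(year == max(year)) |> # keep 2007
  select(-year)                # year is now constant, so drop it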

country      continent  lifeExp  pop        gdpPercap
Afghanistan  Asia       43.828   31889923   974.5803
Albania      Europe     76.423   3600523    5937.0295
Algeria      Africa     72.301   33333216   6223.3675
Angola       Africa     42.731   12420476   4797.2313
Argentina    Americas   75.320   40301927   12779.3796
Australia    Oceania    81.235   20434176   34435.3674
Austria      Europe     79.829   8199783    36126.4927
Bahrain      Asia       75.635   708573     29796.0483
Bangladesh   Asia       64.062   150448339  1391.2538
Belgium      Europe     79.441   10392226   33692.6051

What kinds of variables are these?

The origins of “statistics”

The word “statistics” originally referred to information about the state. We’ll focus on country-level information like this for now rather than thinking about samples of individuals.

Visualization basics

Consider two types of plots

  • univariate plots

  • bivariate plots

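Each corresponds to a type of distribution: a univariate plot shows the distribution of one variable, and a bivariate plot shows how two variables are distributed together.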

Univariate plots

# density plot
ggplot(d,
       aes(x = gdpPercap)) +
  geom_density()

# histogram of GDP per capita
ggplot(d,
       aes(x = gdpPercap)) +
  geom_histogram(binwidth = 5000,
                 boundary = 0,
                 color = "white")

# histogram of life expectancy
ggplot(d,
       aes(x = lifeExp)) +
  geom_histogram(binwidth = 5,
                 boundary = 0,
                 color = "white")
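
To make the binning concrete, here is a minimal base R sketch of roughly what a histogram with binwidth = 5 and boundary = 0 computes. It is an illustration, not the slides' own code, and it uses only the ten lifeExp values shown in the 2007 table above rather than the full data. The names life_exp, edges, and bins are hypothetical.

```r
# lifeExp values for the ten countries shown in the 2007 table above
life_exp <- c(43.828, 76.423, 72.301, 42.731, 75.320,
              81.235, 79.829, 75.635, 64.062, 79.441)

# Bin edges every 5 years, aligned to multiples of 5 (as boundary = 0 does)
edges <- seq(40, 85, by = 5)

# cut() assigns each value to an interval; table() counts values per interval
bins <- table(cut(life_exp, breaks = edges))
bins
```

Each count is the height of one bar. Changing the bin width or the boundary changes the edges, and so can change the shape of the histogram.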

# bar graph
ggplot(d,
       aes(x = continent)) +
  geom_bar()
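
With only x mapped, geom_bar() draws one bar per category, and the bar height is the number of rows in that category. The base R sketch below reproduces that count for the ten countries shown in the 2007 table above; it is an illustration only, and the object names continent and counts are hypothetical.

```r
# Continent of each of the ten countries shown in the 2007 table above
continent <- c("Asia", "Europe", "Africa", "Africa", "Americas",
               "Oceania", "Europe", "Asia", "Asia", "Europe")

# table() counts how many rows fall in each category
counts <- table(continent)
counts
```

This frequency table is the univariate distribution of a categorical variable; the bar graph is a picture of it.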

Bivariate plots

# scatterplot
ggplot(d,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point()

# bar graph (bivariate)
d |> 
  group_by(continent) |> 
  summarize(GDP = mean(gdpPercap)) |>
  ggplot(aes(x = continent,
             y = GDP)) +
  geom_col()  # same as geom_bar(stat = "identity")

# "strip plot" (overlapping points)
ggplot(d,
       aes(x = gdpPercap,
           y = continent)) +
  geom_point(alpha = .3)

# "strip plot" with jitter to reduce overplotting
ggplot(d,
       aes(x = gdpPercap,
           y = continent)) +
  geom_jitter(height = .1,
              width = .1,
              alpha = .2)

Time plots (bivariate)

Let’s go back to the full data and look at trends in Oceania.

oceania_plot <- gapminder |> 
  filter(continent == "Oceania") |> 
  ggplot(aes(x = year,
             y = lifeExp,
             group = country,
             color = country)) +
  geom_line()

oceania_plot

Your turn!